MSQ-Index: A Succinct Index for Fast Graph Similarity Search
Authors
Abstract
Graph similarity search has received considerable attention in many applications, such as bioinformatics, data mining, pattern recognition, and social networks. Existing methods for this problem have limited scalability because of the huge amount of memory they consume when handling very large graph databases with millions or billions of graphs. In this paper, we study the problem of graph similarity search under the graph edit distance constraint. We present a space-efficient index structure based upon the q-gram tree that incorporates succinct data structures and hybrid encoding to achieve improved query time performance with minimal space usage. Specifically, our index requires only 5%–15% of the space of the previous state-of-the-art index on the tested data, while achieving a 2–3 times speedup in query time on small data sets. We also boost the query performance by augmenting the global filter with range search, which allows us to perform a query in a reduced region. In addition, we propose two effective filters that combine degree structures and label structures. Extensive experiments demonstrate that our proposed approach is superior in space and competitive in filtering compared with the state-of-the-art approaches. To the best of our knowledge, our index is the first in-memory index for this problem that successfully scales to the large dataset of 25 million chemical structure graphs from the PubChem dataset.
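To make the filtering idea concrete, the following is a minimal sketch of a generic label-count lower bound on graph edit distance, used to prune a database before any exact verification. It is not the q-gram tree index or the degree-and-label filters proposed in the paper. The graph representation (a pair of vertex-label and edge-label lists) and the function names `label_lower_bound` and `filter_candidates` are assumptions made for illustration, and unit-cost edit operations are assumed.

```python
from collections import Counter


def multiset_overlap(a, b):
    """Size of the multiset intersection of two label lists."""
    return sum((Counter(a) & Counter(b)).values())


def label_lower_bound(g, h):
    """Lower bound on the unit-cost graph edit distance between g and h.

    Each graph is a hypothetical (vertex_labels, edge_labels) pair.
    Every vertex or edge label that cannot be matched needs at least one
    insertion, deletion, or relabeling, so the count never overestimates.
    """
    g_vertices, g_edges = g
    h_vertices, h_edges = h
    vertex_part = max(len(g_vertices), len(h_vertices)) - multiset_overlap(g_vertices, h_vertices)
    edge_part = max(len(g_edges), len(h_edges)) - multiset_overlap(g_edges, h_edges)
    return vertex_part + edge_part


def filter_candidates(database, query, tau):
    """Indices of graphs whose lower bound does not exceed the threshold tau.

    Graphs that pass are only candidates; an exact edit distance
    computation is still needed to confirm them.
    """
    return [i for i, g in enumerate(database) if label_lower_bound(g, query) <= tau]
```

Because the bound never exceeds the true distance, a graph rejected by `filter_candidates` cannot be within distance `tau` of the query, so the filter produces no false dismissals.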
Similar resources
Automatic graph construction of periodic open tubulene ((5,6,7)3) and computation of its Wiener, PI, and Szeged indices
The mathematical properties of nano molecules are an interesting branch of nanoscience for researchers nowadays. The periodic open single wall tubulene is one of the nano molecules which is built up from two caps and a distancing nanotube/neck. We discuss how to automatically construct the graph of this molecule and plot the graph with the spring layout algorithm in the graphviz and networkx packages. The...
Scalable Similarity Search for Molecular Descriptors
Similarity search over chemical compound databases is a fundamental task in the discovery and design of novel drug-like molecules. Such databases often encode molecules as non-negative integer vectors, called molecular descriptors, which represent rich information on various molecular properties. While there exist efficient indexing structures for searching databases of binary vectors, solution...
Structure and attribute index for approximate graph matching in large graphs
The increasing popularity of graph data in various domains has led to a renewed interest in developing efficient graph matching techniques, especially for processing large graphs. In this paper, we study the problem of approximate graph matching in a large attributed graph. Given a large attributed graph and a query graph, we compute a subgraph of the large graph that best matches the query gr...
Fast image search on a VQ compressed image database
A fast and efficient image search method is developed for a compressed image database using vector quantization (VQ). An image search on an image database requires an exhaustive sequential scan of all the images, given the similarity measure. If compressed images are dealt with, images are decompressed as an initial operation and then the previously mentioned exhaustive search is performed usin...
A simple alphabet-independent FM-index
We design a succinct full-text index based on the idea of Huffman-compressing the text and then applying the Burrows-Wheeler transform over it. The resulting structure can be searched as an FM-index, with the benefit of removing the sharp dependence on the alphabet size, σ, present in that structure. On a text of length n with zero-order entropy H0, our index needs O(n(H0 + 1)) bits of space, wi...
Journal title:
- CoRR
Volume abs/1612.09155, Issue
Pages -
Publication date 2016